Automated Japanese grapheme-phoneme alignment

Authors

  • Timothy Baldwin
  • Hozumi Tanaka
Abstract

This paper describes an adaptation of the tf-idf model to Japanese grapheme-phoneme alignment, without reliance on training data. The tf-idf model is optionally complemented with affixation and conjugation handling modules, and determines frequencies through analysis of “alignment potential”. The proposed system achieved a maximum accuracy of 94.74% on evaluation.
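
The abstract does not spell out how frequencies are collected, so the following is only an illustrative sketch of the general idea of counting grapheme-phoneme pairings over all candidate alignments, not the authors' scoring method. It assumes each grapheme character maps to one non-empty contiguous chunk of the reading, and it ignores affixation and conjugation handling. The function names `candidate_alignments` and `pair_counts` and the toy lexicon are hypothetical.

```python
from collections import Counter
from itertools import combinations


def candidate_alignments(graphemes, phonemes):
    """Yield every split of `phonemes` into len(graphemes) non-empty chunks.

    Each candidate is a list of (grapheme, phoneme_chunk) pairs.
    Assumption: one contiguous, non-empty chunk per grapheme character.
    """
    n, m = len(graphemes), len(phonemes)
    if n == 0 or m < n:
        return
    # Choose n - 1 cut points strictly inside the phoneme string.
    for cuts in combinations(range(1, m), n - 1):
        bounds = (0, *cuts, m)
        yield [
            (graphemes[i], phonemes[bounds[i]:bounds[i + 1]])
            for i in range(n)
        ]


def pair_counts(lexicon):
    """Count how often each (grapheme, chunk) pair occurs across all candidates."""
    counts = Counter()
    for graphemes, phonemes in lexicon:
        for alignment in candidate_alignments(graphemes, phonemes):
            counts.update(alignment)
    return counts


# Toy lexicon (hypothetical data): written form and kana reading.
lexicon = [("山川", "やまかわ"), ("山", "やま"), ("川", "かわ")]
counts = pair_counts(lexicon)
```

In this toy lexicon, pairings that recur across entries, such as 山 with やま, accumulate higher counts than pairings that appear in only one candidate. A tf-idf style score could then be built on top of such counts, although the exact weighting used in the paper is not given in the abstract.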

Similar resources

Efficient Grapheme-phoneme Alignment for Japanese

Current approaches to the grapheme-phoneme alignment problem for Japanese achieve good accuracy, but are extremely computationally expensive. In this paper we evaluate various modifications to previous algorithms for both the alignment and okurigana detection subtasks. The best algorithm achieved accuracy of 96.2% for the combined task on a limited data set, and was significantly more efficient...


The Applications Of Unsupervised Learning To Japanese Grapheme-Phoneme Alignment

In this paper, we adapt the TF-IDF model to the Japanese grapheme-phoneme alignment task, by way of a simple statistical model and an incremental learning method. In the incremental learning method, grapheme-phoneme alignment paradigms are disambiguated one at a time according to the relative plausibility of the highest scoring alignment schema, and the statistical model is re-trained accordin...


A Comparative Study of Unsupervised Grapheme-Phoneme Alignment Methods

This paper describes and compares two unsupervised algorithms to automatically align Japanese grapheme and phoneme strings, identifying segment-level correspondences between them. The first algorithm is inspired by the tf-idf model, including enhancements to handle phonological variation and determine frequency through analysis of “alignment potential”. The second algorithm relies on the C4.5 c...


A Language-Independent, Data-Oriented Architecture for Grapheme-to

We report on an implemented grapheme-to-phoneme conversion architecture. Given a set of examples (spelling words with their associated phonetic representation) in a language, a grapheme-to-phoneme conversion system is automatically produced for that language which takes as its input the spelling of words, and produces as its output the phonetic transcription according to the rules implicit in t...


PermA and Balloon: Tools for string alignment and text processing

Two online research tools are presented in this paper: PermA, a general-purpose string aligner which can, for example, be used for grapheme-to-phoneme and phoneme-to-phoneme alignment, and Balloon, a text processing toolkit for German and English providing components for part-of-speech tagging, morphological analyses, and grapheme-to-phoneme conversion including syllabification and word-stress ass...


Publication date: 1999